We propose a method for automatically generating abstract transformers for static
analysis by abstract interpretation. The method focuses on linear constraints on programs
operating on rational, real or floating-point variables and containing linear assignments
and tests.

In addition to loop-free code, the same method also applies for obtaining least fixed
points as functions of the precondition, which permits the analysis of loops and recursive
functions. Our algorithms are based on new quantifier elimination and symbolic manipulation
techniques.

Given the specification of an abstract domain, and a program block, our method 
automatically outputs an implementation of the corresponding abstract transformer. It 
is thus a form of program transformation. 
The motivation of our work is data-flow synchronous programming languages, used
for building control-command embedded systems, but it also applies to imperative and
functional programming.



1 Introduction

In program analysis, it is often necessary to prove or infer numerical properties of programs, for
instance, in order to prove certain relationships between array indices, or to prove the absence
of overflows. Static program analysis by abstract interpretation obtains properties of variables,
or of relationships between variables, representable in an abstract domain. Examples of "classical"
numerical abstract domains for numerical properties include intervals [Cousot and Cousot, 1976],
where to each variable $x$ one attaches an interval $[x_{\min}, x_{\max}]$, and convex polyhedra
[Cousot and Halbwachs, 1978], where conjunctions of inequalities $a_1 x_1 + \dots + a_n x_n \le c$ are
inferred.

For each implemented numerical domain and each program instruction, the static analyzer
must provide an abstract transfer function, which maps the property before the instruction
to a safe property after the instruction (for forward analysis; the reverse is true of backward
analysis). For instance, over the intervals, z=x+y is optimally abstracted as $z_{\max} = x_{\max} + y_{\max}$
and $z_{\min} = x_{\min} + y_{\min}$; the transfer functions for polyhedra are more complex. While the
designers of abstract interpreters generally strive so that the output property is "optimal"
(the interval $[z_{\min}, z_{\max}]$ defined above is the least possible one for the inclusion ordering),
optimality is not preserved by composition. Consider, for instance, y=x; z=x-y; with the
precondition that $x \in [0,1]$. The interval for z, obtained from those for x and y by applying
the rules of interval arithmetic, is $[-1,1]$; yet, the optimal interval is $\{0\}$. The reason for
this loss of precision is that while the computation of the interval for z from those for x and
y is locally optimal, it does not take into account the relationship between x and y.
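The loss of precision described above can be reproduced in a few lines of C. This is only a minimal sketch under stated assumptions: the `interval` type and the functions `interval_sub`, `analyze_stepwise` and `analyze_block` are illustrative names that do not come from the article, and floating-point rounding is ignored.

```c
#include <assert.h>

/* Closed interval [lo, hi]; doubles stand in for reals (rounding ignored). */
typedef struct { double lo, hi; } interval;

/* Optimal interval transformer for the single instruction z = x - y. */
static interval interval_sub(interval x, interval y) {
    interval z = { x.lo - y.hi, x.hi - y.lo };
    return z;
}

/* Instruction-by-instruction analysis of:  y = x; z = x - y; */
static interval analyze_stepwise(interval x) {
    assert(x.lo <= x.hi);        /* assumed precondition: nonempty interval */
    interval y = x;              /* y = x: y receives the interval of x     */
    return interval_sub(x, y);   /* z = x - y: the relation x == y is lost  */
}

/* Transformer for the block taken as a whole: z is always 0. */
static interval analyze_block(interval x) {
    assert(x.lo <= x.hi);
    interval z = { 0.0, 0.0 };
    return z;
}
```

With the precondition [0, 1], `analyze_stepwise` yields [-1, 1] while `analyze_block` yields [0, 0], which is the gap between composing locally optimal transformers and abstracting the block as a whole.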

Our initial target application was programs written in synchronous data-flow languages
such as Lustre [Caspi et al., 1987], Simulink or Scade [Caspi et al., 2003]. In these languages,
operators are built out of elementary operators, introducing many intermediate variables.
Successions of small elementary operations may also occur when analyzing low-level
code, e.g. assembly [Gopan and Reps, 2007; Balakrishnan and Reps, 2004] or Java bytecode,
and they hamper certain static analysis methods due to the reduced size of the code window
used for transfer functions [Logozzo and Fähndrich, 2008]. Analyzing floating-point code at
the assembly level may actually be easier than analyzing higher-level programs, since the
semantics of elementary floating-point operations are usually fairly well-defined while the
definition and compiling processes of higher-level languages may leave significant leeway
[Monniaux, 2008b]. It is therefore important, for such applications, to be able to analyze
program blocks as a whole and not as a succession of independent operations.

In the above simple example, we could obtain better precision by using a relational abstract
domain linking the inputs and the outputs of the procedure. In general, though, the code
fragment may contain tests and loops (or, more generally, semantic fixed points), which
complicates the matter (see Sec. 3.4.3 for a short example whose semantics involves a fixed
point).

Ideally, for better precision, the analyzer should provide a (hopefully optimal) abstract 
transfer function for each possible program block (fragment of code without loops). However, 
the designers of the analyzer cannot include a hand-coded function for each possible program 
block to be analyzed, if only because the number of possible program blocks is infinite. Also, 
the user might want to use abstract domains not pre-programmed in the analyzer. We would
like abstract transfer functions to be obtained automatically from the definition of the
abstract domain and the source code (or semantics) of the program block.

In this article, we show how to automatically transform program blocks without loops 
into an effective implementation of their optimal abstract transfer function. This optimal 
transformer maps constraints on the block inputs to the tightest possible constraints on the 
block output. This transformation is parametric in the abstract domain used: it takes as 
inputs both the program block and a specification of the abstract domain, and outputs the 
corresponding transfer function. The same method applies for both forward and backward 
analysis by abstract interpretation, though, for the sake of simplicity, the article focuses on 
forward analysis. 

In short, our analysis considers the exact transition relation of loop-free program fragments
as an existentially quantified formula. From that formula, it is able to compute the
optimal abstract transformer for the fragment with respect to a user-specified abstract domain,
or even for the least invariant of the fragment in that abstract domain. The user may
specify any abstract domain in the wide class of template linear abstract domains
[Colón et al., 2003].



Our method is based upon quantifier elimination in the theory of rational linear arithmetic.
It has long been known that this theory admits quantifier elimination, but algorithms
remained mostly impractical. Recent improvements in SAT/SMT solving techniques have
made it possible to perform quantifier elimination on larger formulas [Monniaux, 2008a].

We also show how to obtain transfer functions for loops, which are also optimal in a certain 
sense (they compute the least inductive invariant representable in the abstract domain). 

In the beginning of the article, we focus on simple forward analysis of loop-free blocks, 
then single loops (or single fixed points), for programs dealing with real or rational variables. 
The same methods apply to integer variables, at the expense of some added abstraction. 
We show in later sections how to deal with various constructions, including nested loops and 
arbitrary control-flow graphs, recursive procedures and floating-point computations. Our focus
was indeed, originally, synchronous data-flow programs operating over real (for modeling) or 
floating-point (for execution) variables, but we realized that the same technique could apply 
to a wider spectrum of languages. 



Our analysis goes further than most constraint-based static analyses [Sankaranarayanan et al.,
2005, 2004] in that it computes the general form of the optimal postcondition or least inductive
invariant as a function of the precondition parameters, not just for specific values of those
parameters. For a simple example, if the procedure is invoked on the interval domain and the
z := x + y operation, our transformation outputs $z_{\min} := x_{\min} + y_{\min}$ and $z_{\max} := x_{\max} + y_{\max}$.
This is especially important since the function mapping the input parameters to the output
parameters may be non-convex (a simple example is the abstraction of the absolute value
with respect to intervals from Sec. 3.2).

In the above case, the abstract transfer function is linear, but in general it is only piecewise 
linear. It can be expressed as a simple executable program, consisting only of tests and 
assignments (see an example at the end of Sec. 3.2). The analysis thus amounts to a program
transformation from the concrete to the abstract program. An advantage of obtaining the 
abstract transfer functions in such a form is that it can be compiled as an ordinary program 
and loaded back into the analyzer for maximal efficiency. The abstract transfer function 
obtained by the analysis of a block may be retained for future use, since it is valid in any 
context. An application of our transformation is therefore modular interprocedural analysis. 

We have so far considered analyses where the constraints apply to program variables
at a given control point. It is also possible to consider relationships between variables at
two different control points, especially the entry and exit of procedures. This way, we can
also analyze programs with recursive procedures, including the famous McCarthy 91 function
[Manna and McCarthy, 1969; Manna and Pnueli, 1970].



Contrary to most analyses of numerical properties based on abstract interpretation, our
analysis for loops does not use widening operators for finding over-approximations of least
fixed points. For instance, the set of reachable states at the start of a loop (a loop invariant)
is expressed as the least fixed point of the transition relation that contains the input
precondition. In widening-based analyses, over-approximations of the set of reachable states after
1, 2, 3, etc. loop iterations are computed, and the analyzer tries to extrapolate these results
in order to obtain some "candidate" for being a loop invariant. For instance, an abstract
analyzer based on intervals may obtain $[1, 2]$, $[1, 3]$, $[1, 5]$, and, because the lower bound of the
interval stays stable and the upper bound is unstable, may try $[1, +\infty[$. If $[1, +\infty[$ is stable
under the transition relation, then it is a safe invariant, otherwise further widening is needed.
Widenings are a major source of imprecision in many static analyzers and their design is
somewhat of a "black art". While the soundness of the transition relation and the stability
test ensure that the analysis results are correct, and the correct construction of the widening
operator ensures termination, the quality of the over-approximation obtained (whether it is
close to the actual least invariant or far from it) depends on various factors. In contrast, our
method is guaranteed to yield least inductive invariants.
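For contrast with the approach of this article, the widening-based iteration described above can be sketched as follows. This is a hedged illustration of the standard interval widening, not code from the article: the loop body x = x + 1, the `interval` type and all function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <math.h>   /* only for the INFINITY macro */

typedef struct { double lo, hi; } interval;

/* Standard interval widening: a bound that moved is pushed to infinity. */
static interval widen(interval old, interval next) {
    interval w;
    w.lo = (next.lo < old.lo) ? -INFINITY : old.lo;
    w.hi = (next.hi > old.hi) ? INFINITY : old.hi;
    return w;
}

/* One abstract iteration for the hypothetical loop body x = x + 1,
   joined with the initial interval init. */
static interval step(interval init, interval cur) {
    interval r;
    r.lo = (init.lo < cur.lo + 1.0) ? init.lo : cur.lo + 1.0;
    r.hi = (init.hi > cur.hi + 1.0) ? init.hi : cur.hi + 1.0;
    return r;
}

/* Inclusion test a ⊆ b, used as the stability check. */
static int included(interval a, interval b) {
    return b.lo <= a.lo && a.hi <= b.hi;
}

/* Iterate, widening the candidate until it is stable under step. */
static interval analyze_with_widening(interval init) {
    assert(init.lo <= init.hi);
    interval cur = init;
    for (;;) {
        interval next = step(init, cur);
        if (included(next, cur)) return cur;   /* stable: safe invariant */
        cur = widen(cur, next);
    }
}
```

Starting from [1, 2], the candidate becomes [1, +∞[ after one widening step and is then stable. The stability check only guarantees that the result is a sound invariant; nothing in this scheme guarantees that it is the least one.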

In Sec. 2 we recall facts about formulas built out of linear inequalities. In Sec. 3.1 we define
the class of abstract domains that we consider. In Sec. 3.2 we show how we obtain optimal
abstract transformers as logical formulas, and in Sec. 3.3 how to compile these formulas into
executable functions. In Sec. 3.4 we show how the same process applies to least inductive
invariants. In Sec. 4 we show how to deal with various extensions to the admissible domains
and operations: how to allow infinite values for constraint parameters, how to allow some
class of non-convex domains, how to partition the state space, and how to model floating-point
computations using real numbers. In Sec. 5 we shall see how to deal with recursive
procedures and arbitrary control-flow graphs.



2 Linear formulas 

We consider logical formulas built out of linear inequalities. A linear expression is a sum
$a_1 v_1 + \dots + a_n v_n$ where the $a_i \in \mathbb{Q}$ and the $v_i$ are variables. $\mathbb{Q}$ denotes the field of rational
numbers, $\mathbb{R}$ the field of real numbers. A linear inequality is of the form $l \ge 0$ or $l > 0$, where
$l$ is a linear expression. Linear inequalities can always be scaled so that they use only integer
coefficients, as opposed to rationals. $a < b < c$ is shorthand for $a < b \wedge b < c$. Unquantified
formulas are built out of atomic formulas (linear inequalities) using logical connectives $\wedge$
and $\vee$. $l = 0$ means $l \ge 0 \wedge l \le 0$. A formula is said to be in disjunctive normal form (DNF) if
it is written as a disjunction $C_1 \vee \dots \vee C_n$, where each of the $C_i$ is a conjunction
$A_{i,1} \wedge \dots \wedge A_{i,n_i}$, where the $A_{i,j}$ are atomic formulas or negations thereof. Quantified
formulas are built out of the same, plus the universal and existential quantifiers $\forall$ and $\exists$.

The $\mathbb{Q}$-models (respectively, $\mathbb{R}$-models) of a formula $F$ are mappings $m$ from the free
variables of $F$ to $\mathbb{Q}$ (respectively, $\mathbb{R}$) such that $m$ verifies the formula; we then write $m \models F$.
$F$ is said to be true if every assignment is a model (a model is a mapping from the set of
variables to $\mathbb{Q}$ or $\mathbb{R}$), satisfiable if it has a model, and false or unsatisfiable otherwise. Truth
and satisfiability are equivalent if $F$ has no free variables.

We say that two formulas $F$ and $G$ with the same free variables are equivalent, noted
$F \equiv G$, if they have the same models. Any formula is equivalent to a formula in disjunctive
normal form, which can be obtained by repeated application of distributivity: $a \wedge (b \vee c) \equiv
(a \wedge b) \vee (a \wedge c)$. $F$ is said to imply $G$, noted $F \Rightarrow G$, if all models of $F$ are models of $G$. We
say that $F$ and $G$ are equivalent modulo assumptions $T$, noted $F \equiv_T G$, if $F \wedge T \equiv G \wedge T$;
we define $F \Rightarrow_T G$ similarly as $F \wedge T \Rightarrow G \wedge T$. Equivalences modulo assumptions are often
used when simplifying formulas. For instance, if we know that a certain program is always
used in a context where the assumption $T$, namely $a < b$, holds, and program analysis, at some
point, generates the formula $F$ defined as $\exists x\; a < x < b$, then this formula can be simplified
to $G$ defined as true.
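The simplification in this example can be spelled out as a short worked derivation. The step below relies only on the density of the rationals (or reals); the layout is ours.

```latex
\begin{align*}
F &\;=\; \exists x\; (a < x \wedge x < b)
   && \text{formula produced by the analysis} \\
  &\;\equiv\; a < b
   && \text{eliminate } x\text{: some } x \text{ lies strictly between } a \text{ and } b
      \text{ iff } a < b \\
F \wedge T &\;\equiv\; (a < b) \wedge (a < b) \;\equiv\; \mathit{true} \wedge T
   && \text{hence } F \equiv_T \mathit{true}
\end{align*}
```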

The theory of linear inequalities admits quantifier elimination: for any formula $F$ with
quantifiers, there exists a formula $G$ without quantifiers such that $G \equiv F$. There exist several
algorithms that compute such a $G$ from $F$. [Ferrante and Rackoff, 1975] proposed a doubly
exponential method [Bradley and Manna, 2007, Sec. 7.3], which is too slow in practice; we
have since proposed another algorithm that takes advantage of the recent improvements in
satisfiability testing technology [Monniaux, 2008a]. Our algorithm also allows conversion to
disjunctive normal form, and formula simplification modulo assumptions.



3 Optimal Abstraction over Template Linear Constraint Domains

3.1 Template Linear Constraint Domains 

Let $F$ be a formula over linear inequalities. We call $F$ a domain definition formula if the free
variables of $F$ split into $n$ parameters $p_1, \dots, p_n$ and $m$ state variables $s_1, \dots, s_m$. We note
$\gamma_F : \mathbb{Q}^n \to \mathcal{P}(\mathbb{Q}^m)$ the function defined by $\gamma_F(p) = \{ s \in \mathbb{Q}^m \mid (p, s) \models F \}$. As an example,
the interval abstract domain for 3 program variables $s_1, s_2, s_3$ uses 6 parameters
$m_1, M_1, m_2, M_2, m_3, M_3$; the formula is $m_1 \le s_1 \le M_1 \wedge m_2 \le s_2 \le M_2 \wedge m_3 \le s_3 \le M_3$.

In this section, we focus on the case where $F$ is a conjunction $L_1(s_1, \dots, s_m) \le p_1 \wedge
\dots \wedge L_n(s_1, \dots, s_m) \le p_n$ of linear inequalities whose left-hand sides are fixed and whose
right-hand sides are parameters. Such conjunctions define the class of template linear constraint
domains [Colón et al., 2003]. Particular examples of abstract domains in this class are:

• the intervals (for any variable $s$, consider the linear forms $s$ and $-s$);

• the difference bound matrices (for any variables $s_1$ and $s_2$, consider the linear form
$s_1 - s_2$);

• the octagon abstract domain (for any variables $s_1$ and $s_2$, distinct or not, consider the
linear forms $\pm s_1 \pm s_2$) [Miné, 2001];

• the octahedra (for any tuple of variables $s_1, \dots, s_n$, consider the linear forms
$\pm s_1 \cdots \pm s_n$) [Clarisó and Cortadella, 2004].



Remark that $\gamma_F$ is in general not injective, and thus one should distinguish the syntax of
the values of the abstract domain (the vector of parameters $p$) and their semantics $\gamma_F(p)$. As
an example, if one takes $F$ to be $s_1 \le p_1 \wedge s_2 \le p_2 \wedge s_1 + s_2 \le p_3$, then both $(p_1, p_2, p_3) = (1, 1, 2)$
and $(1, 1, 3)$ define the same set for state variables $s_1$ and $s_2$. If $u \le v$ coordinate-wise, then
$\gamma_F(u) \subseteq \gamma_F(v)$, but the converse is not true due to the non-uniqueness of the syntactic form.

Take any nonempty set of states $W \subseteq \mathbb{Q}^m$. Take for all $i = 1, \dots, n$: $p_i = \sup_{s \in W} L_i(s)$.
Clearly, $W \subseteq \gamma_F(p_1, \dots, p_n)$, and in fact $p$ is such that $\gamma_F(p)$ is the least solution to this
inclusion. $p_i$ belongs in general to $\mathbb{R} \cup \{+\infty\}$, not necessarily to $\mathbb{Q} \cup \{+\infty\}$ (for instance,
if $W = \{ s_1 \mid s_1^2 \le 2 \}$ and $L_1 = s_1$, then $p_1 = \sqrt{2}$). We have therefore defined an
$\alpha_F : \mathcal{P}(\mathbb{R}^m) \to \{\bot\} \cup (\mathbb{R} \cup \{+\infty\})^n$, and $(\alpha_F, \gamma_F)$ form a Galois connection: $\alpha_F$ maps any set to
its best upper-approximation. The fixed points of $\alpha_F \circ \gamma_F$ are the normal forms. For instance,
$s_1 \le 1 \wedge s_2 \le 1 \wedge s_1 + s_2 \le 2$ is in normal form, while $s_1 \le 1 \wedge s_2 \le 1 \wedge s_1 + s_2 \le 3$ is not.
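The abstraction and concretization maps can be illustrated on the three-row template $s_1$, $s_2$, $s_1 + s_2$ used in the example. The sketch below is an assumption-laden illustration, not the article's implementation: it handles only a finite set $W$ (so the supremum is a maximum), and `TEMPLATE`, `alpha_finite` and `in_gamma` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NVARS 2   /* m: number of state variables (illustrative) */
#define NROWS 3   /* n: number of template rows   (illustrative) */

/* Template rows L_i: s1, s2 and s1 + s2, as in the example of the text. */
static const double TEMPLATE[NROWS][NVARS] = {
    { 1.0, 0.0 },
    { 0.0, 1.0 },
    { 1.0, 1.0 },
};

/* Value of the linear form given by row at the state s. */
static double eval_row(const double row[NVARS], const double s[NVARS]) {
    double v = 0.0;
    for (size_t j = 0; j < NVARS; j++) v += row[j] * s[j];
    return v;
}

/* alpha restricted to a finite nonempty set W:
   p_i = max over s in W of L_i(s). */
static void alpha_finite(const double w[][NVARS], size_t count,
                         double p[NROWS]) {
    assert(count > 0);
    for (size_t i = 0; i < NROWS; i++) {
        p[i] = eval_row(TEMPLATE[i], w[0]);
        for (size_t k = 1; k < count; k++) {
            double v = eval_row(TEMPLATE[i], w[k]);
            if (v > p[i]) p[i] = v;
        }
    }
}

/* Membership in gamma(p): does s satisfy L_i(s) <= p_i for every row? */
static int in_gamma(const double p[NROWS], const double s[NVARS]) {
    for (size_t i = 0; i < NROWS; i++)
        if (eval_row(TEMPLATE[i], s) > p[i]) return 0;
    return 1;
}
```

For $W = \{(1,0), (0,1), (1,1)\}$ this computes $p = (1, 1, 2)$, the normal form mentioned above; the parameter vector $(1, 1, 3)$ accepts and rejects the same states.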



3.2 Optimal Abstract Transformers for Program Semantics 

We shall consider the input-output relationships of programs with rational or real variables.
We first narrow the problem to programs without loops and consider programs consisting
of linear arithmetic assignments, linear tests, and sequences. Noting $a, b, \dots$ the values of
program variables a, b, ... at the beginning of execution and $a', b', \dots$ the output values, the
semantics of a program $P$ is defined as a formula $\llbracket P \rrbracket$ such that $(a, b, \dots, a', b', \dots) \models \llbracket P \rrbracket$ if
and only if the memory state $(a', b', \dots)$ can be reached at the end of an execution starting
in memory state $(a, b, \dots)$:

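Arithmetic  $\llbracket a := L(a, b, \dots) + K \rrbracket \triangleq a' = L(a, b, \dots) + K \wedge b' = b \wedge c' = c \wedge \dots$ where $K$
is a real constant and $L$ is a linear form, and $b, c, d, \dots$ are all the variables except $a$;

Tests  $\llbracket \text{if } c \text{ then } p_1 \text{ else } p_2 \rrbracket \triangleq (c \wedge \llbracket p_1 \rrbracket) \vee (\neg c \wedge \llbracket p_2 \rrbracket)$;

Nondeterministic choice  $\llbracket a := \text{random} \rrbracket \triangleq b' = b \wedge c' = c \wedge \dots$, for all variables except $a$;

Failure  $\llbracket \text{fail} \rrbracket \triangleq \text{false}$;

Skip  $\llbracket \text{skip} \rrbracket \triangleq a' = a \wedge b' = b \wedge c' = c \wedge \dots$;

Sequence  $\llbracket P_1; P_2 \rrbracket \triangleq \exists a'', b'', \dots\; f_1 \wedge f_2$ where $f_1$ is $\llbracket P_1 \rrbracket$ where $a'$ has been replaced by
$a''$, $b'$ by $b''$ etc., and $f_2$ is $\llbracket P_2 \rrbracket$ where $a$ has been replaced by $a''$, $b$ by $b''$ etc.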

In addition to linear inequalities and conjunctions, such formulas contain disjunctions (due 
to tests and multiple branches) and existential quantifiers (due to sequential composition). 
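As a worked instance of these rules, consider the block y := x; z := x - y from the introduction, over the variables $x, y, z$. The derivation below only applies the Arithmetic and Sequence rules and then eliminates the intermediate variables; the layout is ours.

```latex
\begin{align*}
\llbracket y := x \rrbracket
  &\;=\; y' = x \,\wedge\, x' = x \,\wedge\, z' = z \\
\llbracket z := x - y \rrbracket
  &\;=\; z' = x - y \,\wedge\, x' = x \,\wedge\, y' = y \\
\llbracket y := x;\; z := x - y \rrbracket
  &\;=\; \exists x'', y'', z''\;
     (y'' = x \wedge x'' = x \wedge z'' = z)
     \,\wedge\, (z' = x'' - y'' \wedge x' = x'' \wedge y' = y'') \\
  &\;\equiv\; x' = x \,\wedge\, y' = x \,\wedge\, z' = 0
\end{align*}
```

The last line records that $z' = 0$ whatever the input, which is the relationship that instruction-by-instruction interval analysis loses.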

Note that so far, we have represented the concrete denotational semantics exactly. This 
representation of the transition relation using existentially quantified formulas is evidently as 
expressive as a representation by a disjunction of convex polyhedra (the latter can be obtained 
from the former by quantifier elimination and conversion to disjunctive normal form), but is 
more compact in general. This is why we defer quantifier elimination to the point where we 
compute the abstract transfer relation. 

Consider now a domain definition formula $F \triangleq L_1(s_1, s_2, \dots) \le p_1 \wedge \dots \wedge L_n(s_1, s_2, \dots) \le p_n$
on the program inputs, with parameters $p$ and free variables $s$, and another
$F' \triangleq L'_1(s'_1, s'_2, \dots) \le p'_1 \wedge \dots \wedge L'_n(s'_1, s'_2, \dots) \le p'_n$ on the program outputs, with parameters $p'$
and free variables $s'$.
Sound forward program analysis consists in deriving a safe post-condition from a precondition:
starting from any state verifying the precondition, one should end up in the post-condition.
Using our notations, the soundness condition is written

$$\forall s, s'\; F \wedge \llbracket P \rrbracket \Rightarrow F' \qquad (1)$$

The free variables of this relation are $p$ and $p'$; the formula links the value of the parameters
of the input constraints to admissible values of the parameters for the output constraints.
Note that this soundness condition can be written as a universally quantified formula, with
no quantifier alternation. Alternatively, it can be written as a conjunction of correctness
conditions for each output constraint parameter: $C'_i \triangleq \forall s, s'\; F \wedge \llbracket P \rrbracket \Rightarrow L'_i(s') \le p'_i$.

Let us take a simple example: if $P$ is the program instruction z := x + y, $F \triangleq x \le p_1 \wedge
y \le p_2$, $F' \triangleq z' \le p'_1$, then $\llbracket P \rrbracket \triangleq z' = x + y$, and the soundness condition is
$\forall x, y, z'\, (x \le p_1 \wedge y \le p_2 \wedge z' = x + y \Rightarrow z' \le p'_1)$. Remark that this soundness condition is
equivalent to a formula without quantifiers $p'_1 \ge p_1 + p_2$, which may be obtained through quantifier
elimination. Remark also that while any value for $p'_1$ fulfilling this condition is sound (for
instance, $p'_1 = 1000$ for $p_1 = p_2 = 1$), only one value is optimal ($p'_1 = 2$ for $p_1 = p_2 = 1$). An
optimal value for the output parameter $p'_i$ is defined by $O'_i \triangleq C'_i \wedge \forall q'_i\, (C'_i[q'_i/p'_i] \Rightarrow p'_i \le q'_i)$.
Again, quantifier elimination can be applied; on our simple example, it yields $p'_1 = p_1 + p_2$.
If there are $n$ input constraint parameters $p_1, \dots, p_n$, then the optimal value for each output
constraint parameter $p'_i$ is defined by a formula $O'_i$ with $n+1$ free variables $p_1, \dots, p_n, p'_i$. This
formula defines a partial function from $\mathbb{Q}^n$ to $\mathbb{Q}$, in the mathematical sense: for each choice
of $p_1, \dots, p_n$, there exists at most one $p'_i$. The values of $p_1, \dots, p_n$ for which there exists
a corresponding $p'_i$ make up the domain of validity of the abstract transfer function. Indeed,
this function is in general not defined everywhere; consider for instance the program:

if (x >= 10) { y = random; } else { y = 0; } 

If $F \triangleq x \le p_1$ and $F' \triangleq y \le p'_1$, then $O'_1 \equiv p_1 < 10 \wedge p'_1 = 0$, and the function is defined only
for $p_1 < 10$.
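The partial function of this example can be written as executable code. The sketch below is ours: the name `transfer_upper_y` and the convention of returning `false` outside the domain of validity are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Abstract transformer for:  if (x >= 10) { y = random; } else { y = 0; }
   Input:  p1, with precondition x <= p1.
   Output: *p1_out, the least bound such that y <= *p1_out, when it exists.
   Returns false outside the domain of validity (p1 >= 10), where x may
   reach 10 and y is then unbounded. */
static bool transfer_upper_y(double p1, double *p1_out) {
    assert(p1_out != NULL);
    if (p1 < 10.0) {
        *p1_out = 0.0;   /* only the else branch is reachable: y = 0 */
        return true;
    }
    return false;        /* no finite upper bound on y */
}
```

A caller is expected to check the return value before reading the output parameter, mirroring the fact that the formula has no solution for $p'_1$ when $p_1 \ge 10$.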

At this point, we have a characterization of the optimal abstract transformer corresponding
to a program fragment $P$ and the input and output domain definition formulas as $n$ formulas
$O'_i$ (where $n$ is the number of output parameters), each defining a function (in the mathematical
sense) mapping the input parameters $p$ to the output parameter $p'_i$.

Another example: the absolute value function $y := |x|$, again with the interval abstract
domain. The semantics of the operation is $(x \ge 0 \wedge y = x) \vee (x < 0 \wedge y = -x)$; the precondition
is $x \in [x_{\min}, x_{\max}]$ and the post-condition is $y \in [y_{\min}, y_{\max}]$. Acceptable values for $(y_{\min}, y_{\max})$
are characterized by the formula

$$G \triangleq \forall x\; x_{\min} \le x \le x_{\max} \Rightarrow y_{\min} \le |x| \le y_{\max} \qquad (2)$$

The optimal value for $y_{\max}$ is defined by $G \wedge \forall y'_{\max}\; G[y'_{\max}/y_{\max}] \Rightarrow y_{\max} \le y'_{\max}$. Quantifier
elimination over this last formula gives as characterization for the least, optimal, value for
$y_{\max}$:

$$(x_{\min} + x_{\max} \ge 0 \wedge y_{\max} = x_{\max}) \vee (x_{\min} + x_{\max} < 0 \wedge y_{\max} = -x_{\min}) \qquad (3)$$

We shall see in the next sub-section that such a formula can be automatically compiled into 
code such as: 

if (xmin + xmax >= 0) {
  ymax = xmax;
} else {
  ymax = -xmin;
}
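For reference, a complete interval transformer for the absolute value can be sketched as below. The upper bound follows formula (3); the lower bound is not given in the text and is our own derivation, stated here as an assumption, as are the `interval` type and the name `abs_transfer`.

```c
#include <assert.h>

typedef struct { double lo, hi; } interval;

/* Interval transformer for y = |x|.
   Upper bound: formula (3) of the text.
   Lower bound: assumed, not taken from the text: the distance from 0
   to the input interval. */
static interval abs_transfer(interval x) {
    assert(x.lo <= x.hi);            /* assumed: nonempty input interval */
    interval y;
    if (x.lo + x.hi >= 0) {
        y.hi = x.hi;
    } else {
        y.hi = -x.lo;
    }
    if (x.lo >= 0) {
        y.lo = x.lo;                 /* interval entirely nonnegative */
    } else if (x.hi <= 0) {
        y.lo = -x.hi;                /* interval entirely nonpositive */
    } else {
        y.lo = 0;                    /* interval straddles zero       */
    }
    return y;
}
```

The result is piecewise linear in the input bounds, with the regions separated by the tests, which is the general shape of the transformers discussed in the next sub-section.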

3.3 Generation of the Implementation of the Abstract Domain 

Consider formula (3), defining an abstract transfer function. On this disjunctive normal form
we see that the function we have defined is piecewise linear: several regions of the range of
the input parameters are distinguished (here, $x_{\min} + x_{\max} < 0$ and $x_{\min} + x_{\max} \ge 0$), and
on each of these regions, the output parameter is a linear function of the input parameters.
Given a disjunct (such as $y_{\max} = -x_{\min} \wedge x_{\min} + x_{\max} < 0$), the domain of validity of the
disjunct can be obtained by existential quantifier elimination over the result variable (here
$\exists y_{\max}\, (y_{\max} = -x_{\min} \wedge x_{\min} + x_{\max} < 0)$). The union of the domains of validity of the disjuncts
is the domain of validity of the full formula. The domains of validity of distinct disjuncts can
overlap, but in this case, since $O'_i$ defines a function in the mathematical sense, the functions
defined by such disjuncts coincide on their overlapping domains of validity.
This suggests a first algorithm for conversion to an executable form: 

1. Put $O'_i$ into quantifier-free, disjunctive normal form $C_1 \vee \dots \vee C_n$.

2. For each disjunct $C_j$, obtain the validity domain as a conjunction of linear inequalities
and solve for $p'_i$ (obtain $p'_i$ as a linear function $v_j$ of the $p_1, \dots, p_n$).

3. Output the result as a cascade of if-then-else and assignments, as in the example at the
end of Sec. 3.2.
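The three steps above can be pictured with a small data-driven evaluator. This is a sketch under our own assumptions, not the article's generated code: each disjunct is stored as a guard (its domain of validity) plus a linear function, and the types `lin_ineq` and `disjunct`, the function `eval_cascade` and the table `ABS_YMAX` (an encoding of formula (3)) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPARAMS 2     /* input parameters; here (xmin, xmax) */
#define MAXGUARDS 2   /* illustrative bound on guard size    */

/* Linear condition coef . p + cst >= 0, or > 0 when strict is set. */
typedef struct {
    double coef[NPARAMS];
    double cst;
    bool strict;
} lin_ineq;

/* One disjunct C_j: its validity domain (a conjunction of guards) and
   the linear function giving the output parameter on that domain. */
typedef struct {
    size_t nguards;
    lin_ineq guard[MAXGUARDS];
    double out_coef[NPARAMS];
    double out_cst;
} disjunct;

static double lin_eval(const double coef[NPARAMS], double cst,
                       const double p[NPARAMS]) {
    double v = cst;
    for (size_t k = 0; k < NPARAMS; k++) v += coef[k] * p[k];
    return v;
}

static bool in_domain(const disjunct *d, const double p[NPARAMS]) {
    for (size_t g = 0; g < d->nguards; g++) {
        double v = lin_eval(d->guard[g].coef, d->guard[g].cst, p);
        if (d->guard[g].strict ? !(v > 0.0) : !(v >= 0.0)) return false;
    }
    return true;
}

/* Cascade of tests: the first disjunct whose domain contains p gives the
   result. Returns false when p is outside every domain of validity. */
static bool eval_cascade(const disjunct *ds, size_t n,
                         const double p[NPARAMS], double *out) {
    assert(out != NULL);
    for (size_t j = 0; j < n; j++) {
        if (in_domain(&ds[j], p)) {
            *out = lin_eval(ds[j].out_coef, ds[j].out_cst, p);
            return true;
        }
    }
    return false;
}

/* Formula (3): ymax = xmax if xmin + xmax >= 0, ymax = -xmin otherwise. */
static const disjunct ABS_YMAX[2] = {
    { .nguards = 1,
      .guard = { { .coef = { 1.0, 1.0 }, .cst = 0.0, .strict = false } },
      .out_coef = { 0.0, 1.0 }, .out_cst = 0.0 },
    { .nguards = 1,
      .guard = { { .coef = { -1.0, -1.0 }, .cst = 0.0, .strict = true } },
      .out_coef = { -1.0, 0.0 }, .out_cst = 0.0 },
};
```

Because overlapping disjuncts agree on their common domain, returning the first match is safe; the cost is that guards shared between disjuncts may be evaluated repeatedly.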



Algorithm 1 ToITEtree(F, z, T): turn a formula defining z as a function of the other free
variables of F into a tree of if-then-else constructs, assuming that T holds.

  D (= $C_1 \vee \dots \vee C_n$) <- QElimDNFModulo({}, F, T)
  for all $C_i \in D$ do
    $P_i$ <- QElimDNFModulo(z, $C_i$, T)
  end for
  P <- Predicates($P_1, \dots, P_n$)
  if P = $\emptyset$ then
    Ensure: $\exists z\, F$ is always true
    O <- Solve(D, z)
  else
    K <- Choose(P)
    O <- IfThenElse(K, ToITEtree(F, z, $T \wedge K$), ToITEtree(F, z, $T \wedge \neg K$))
  end if



An if-then-else cascade may be inefficient, since identical conditions may have to be tested 
several times. We could of course factor out all conditions and assign them to Boolean 
variables, but then, some of the tests performed may actually not be needed. We therefore 
propose an algorithm for building an if-then-else tree. The idea of the algorithm is as follows: 

• Each path in the if-then-else tree corresponds to a conjunction $C$ of conditions (if one
goes through the "if" branch of if (a) and the "else" branch of if (b), then the path
corresponds to $a \wedge \neg b$).

• The formula $O'_i$ is simplified relatively to $C$, a process that prunes out conditions that
are always or never satisfied when $C$ holds.

• If the path is deep enough, then the simplified formula becomes a conjunction. One
then solves this conjunction to obtain the computed variable (here, $y_{\max}$) as a function.

Our algorithm ToITEtree(F, z, T) (Alg. 1) uses a function QElimDNFModulo(v, F, T)
that, given a possibly empty vector of variables $v$, a formula $F$ and a formula $T$, outputs a
quantifier-free formula $F'$ in disjunctive normal form such that $F' \equiv_T \exists v\, F$ and no useless
predicates are used. Predicates(F) returns the set of atomic predicates of $F$. Solve(D, z)
solves a minimal disjunction $D$ of inequalities for variable $z$, assuming that there is at most
one solution for $z$ for each choice of the other variables; one simple way to do that is to look
for any constraint of the form $z \ge L$ or $z \le L$ and output $z = L$. Choose(P) chooses any
predicate in $P$ (one good heuristic seems to be to choose the most frequent in $P_1, \dots, P_n$).

Let us take, as a simple example, formula (3). We wish to obtain $y_{\max}$ as a function of
$x_{\min}$ and $x_{\max}$, so in the algorithm ToITEtree we set $z = y_{\max}$. $C_1$ is the first disjunct
$x_{\min} + x_{\max} \ge 0 \wedge y_{\max} = x_{\max}$, $C_2$ is the second disjunct $x_{\min} + x_{\max} < 0 \wedge y_{\max} = -x_{\min}$.
We project $C_1$ and $C_2$ parallel to $y_{\max}$, obtaining respectively $P_1 = (x_{\min} + x_{\max} \ge 0)$ and
$P_2 = (x_{\min} + x_{\max} < 0)$. We choose $K$ to be the predicate $x_{\min} + x_{\max} \ge 0$ (in this case, the
choice does not matter, since $P_1$ and $P_2$ are the negation of each other).

• The first recursive call to ToITEtree is made in the context of $T = (x_{\min} + x_{\max} \ge 0)$.
Obviously, $F \wedge T \equiv (y_{\max} = x_{\max}) \wedge T$ and thus $(\exists y_{\max}\, F) \wedge T \equiv T$.
QElimDNFModulo($y_{\max}$, F, T) will then simply output the formula "true". It then
suffices to solve for $y_{\max}$ in $y_{\max} = x_{\max}$. This yields the formula for computing the
correct value of $y_{\max}$ in the cases where $x_{\min} + x_{\max} \ge 0$.

• The second recursive call is made in the context of $T = (x_{\min} + x_{\max} < 0)$. The result is
$y_{\max} = -x_{\min}$, the formula for computing the correct value of $y_{\max}$ in the cases where
$x_{\min} + x_{\max} < 0$.

These two results are then reassembled into an if-then-else statement, yielding the program
at the end of Sec. 3.2.

The algorithm terminates because paths of depth d in the tree of recursive calls correspond 
to truth assignments to d atomic predicates among those found in the domains of validity of 
the elements of the disjunctive normal form of F. Since there is only a finite number of such 
predicates, d cannot exceed that number. A single predicate cannot be assigned truth values 
twice along the same path because the simplification process in QElimDNFModulo erases 
this predicate from the formula. 

3.4 Least Inductive Invariants 

We have so far considered programs without loops. We shall now see that not only can we
compute the optimal abstract post-condition of a block as a simple, executable function of the
parameters of the precondition, but we can also compute the parameters of the least inductive
invariant of a program block that is of the form specified by the abstract domain.$^1$ Beware
that this least inductive invariant found in the abstract domain is in general different from the
least element of the abstract domain that includes the least inductive invariant of the system
(Fig. 1).

3.4.1 Stability Inequalities 

Consider a program fragment: while (c) { p; }. We have domain definition formulas
$F \triangleq L_1(s_1, \dots, s_m) \le p_1 \wedge \dots \wedge L_n(s_1, \dots, s_m) \le p_n$ for the precondition of the program
fragment, and $F' \triangleq L'_1(s_1, \dots, s_m) \le p'_1 \wedge \dots \wedge L'_n(s_1, \dots, s_m) \le p'_n$ for the invariant.

1 In order to specify the least invariant, we would have to quantify over all sets of states, then filter those 
which are inductive invariants. This is second-order quantification, which we cannot handle. By restricting 
ourselves to invariants of a certain shape, we replace it by first order quantification. 
Figure 1: The least fixed point representable in the domain ($\mathrm{lfp}\,(\alpha \circ f \circ \gamma)$) is not necessarily
the least approximation of the least fixed point ($\alpha(\mathrm{lfp}\, f)$) inside the abstract domain. For
instance, if we take a program initialized by $x \in [-1, 1]$ and $y = 0$, and at each iteration,
we rotate the point by 45°, the least invariant is an 8-point star, and the best approximation
inside the abstract domain of intervals is the square $[-1, 1]^2$. However, this square is not an
inductive invariant: no rectangle (product of intervals) is stable under the iterations, thus
there is no abstract inductive invariant.
Define $G \triangleq \llbracket c \rrbracket \wedge \llbracket p \rrbracket$. $G$ is a formula whose free variables are $s_1, \dots, s_m, s'_1, \dots, s'_m$ such
that $(s_1, \dots, s_m, s'_1, \dots, s'_m) \models G$ if and only if the state $(s'_1, \dots, s'_m)$ can be reached from
the state $(s_1, \dots, s_m)$ in exactly one iteration of the loop. A set $W \subseteq \mathbb{Q}^m$ is said to be an
inductive invariant for the head of the loop if $\forall s \in W, \forall s'\; (s, s') \models G \Rightarrow s' \in W$. We seek
inductive invariants of the shape defined by $F'$, thus solutions for $p'$ of the stability condition:

$$\forall s, s'\; F' \wedge G \Rightarrow F'[s'/s] \qquad (4)$$

Not only do we want an inductive invariant, but we also want the initial states of the
program to be included in it. The condition then becomes

$$H \triangleq (\forall s\; F \Rightarrow F') \wedge (\forall s, s'\; F' \wedge G \Rightarrow F'[s'/s]) \qquad (5)$$

This formula links the values of the input constraint parameters $p_1, \dots, p_n$ to acceptable
values of the invariant constraint parameters $p'_1, \dots, p'_n$. In the same way that our soundness
or correctness condition on abstract transformers allowed any sound post-condition, whether
optimal or not, this formula allows any inductive invariant of the required shape as long as it
contains the precondition, not just the least one.

The intersection of sets defined by $p'_1$ and $p'_2$ is defined by $\min(p'_1, p'_2)$. More generally,
the intersection of a family of sets, unbounded yet closed under intersection, defined by $p' \in Z$
is defined by $\min\{p' \mid p' \in Z\}$. We take for $Z$ the set of acceptable parameters $p'$ such that $p'$
defines an inductive invariant and $\forall s\; F \Rightarrow F'$; that is, we consider only inductive invariants
that contain the set $I = \{ s \mid F \}$ of precondition states.

We deduce that $p'_i$ is uniquely defined by: $p'_i = \min \{ p'_i \mid \exists p'_1, \dots, p'_{i-1}, p'_{i+1}, \dots, p'_n\; H \}$, which
can be rewritten as

$$(\exists p'_1, \dots, p'_{i-1}, p'_{i+1}, \dots, p'_n\; H) \wedge (\forall q'\; H[q'/p'] \Rightarrow p'_i \le q'_i) \qquad (6)$$

The free variables of this formula are $p_1, \dots, p_n, p'_i$. This formula defines a function (in the
mathematical sense) defining $p'_i$ from $p_1, \dots, p_n$. As before, this function can be compiled to
an executable version using cascades or trees of tests.

3.4.2 Simple Loop Example 

To show how the method operates in practice, let us consider first a very simple example
(something_happens is a nondeterministic choice):

int i=0;
while (i <= n) {
  if (something_happens) {
    i=i+1;
    if (i == n) {
      i=0;
    }
  }
}
Let us abstract i at the head of the loop using an interval $[i_{\min}, i_{\max}]$. For simplicity,
we consider the case where the loop is entered at least once, and thus i = 0 belongs to
the invariant. For better precision, we model each comparison $x \ne y$ over the integers as
$x \ge y + 1 \vee x \le y - 1$; similar transformations apply for other operators. The formula
expressing that such an interval is an inductive invariant is:



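$i_{\min} \le 0 \wedge 0 \le i_{\max} \wedge \forall i\, \forall i'\ ((i_{\min} \le i \wedge i \le i_{\max} \wedge
  (((i + 1 \le n - 1 \vee i + 1 \ge n + 1) \wedge i' = i + 1) \vee
  (i + 1 = n \wedge i' = 0) \vee i' = i))
  \Rightarrow (i_{\min} \le i' \wedge i' \le i_{\max}))$ (7)

Quantifier elimination produces:

$(i_{\min} \le 0 \wedge i_{\max} \ge 0 \wedge i_{\max} < n \wedge -i_{\min} + n - 2 < 0) \vee
 (i_{\min} \le 0 \wedge i_{\max} \ge 0 \wedge i_{\max} - n + 1 \ge 0 \wedge i_{\max} < n)$ (8)

The formulas defining optimal $i_{\min}$ and $i_{\max}$ are:

$i_{\min} \ge 0 \wedge i_{\min} \le 0 \wedge n > 0$ (9)

$(i_{\max} \ge 0 \wedge i_{\max} \le 0 \wedge n > 0 \wedge n < 2) \vee (i_{\max} = n - 1 \wedge i_{\max} \ge 1)$ (10)
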

We note that this invariant is only valid for n > 0, which is unsurprising given that we 
specifically looked for invariants containing the precondition i = 0. The output abstract 
transfer function is therefore: 

if (n <= 0) {
  fail();
} else {
  iMin = 0;
  if (n < 2) {
    iMax = 0;
  } else /* n >= 2 */ {
    iMax = n-1;
  }
}
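
As a sanity check on the transfer function above, the following minimal sketch enumerates the values that i can take at the head of the loop for a given integer n and compares the largest one with the computed bound. The helper names (reachable_max, abstract_imax) and the exhaustive exploration are illustrative assumptions; they are not part of the method, which works symbolically.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_N 64

/* Hypothetical helper: explores every value i can take at the loop head
   (both outcomes of the nondeterministic choice) and returns the largest.
   Requires 1 <= n < MAX_N. */
int reachable_max(int n) {
    assert(n >= 1 && n < MAX_N);
    bool seen[MAX_N + 1] = { false };
    int stack[MAX_N + 1];
    int top = 0, max_i = 0;
    seen[0] = true;
    stack[top++] = 0;                 /* i = 0 on loop entry */
    while (top > 0) {
        int i = stack[--top];
        if (i > max_i) max_i = i;
        if (i > n) continue;          /* loop condition i <= n fails */
        int next = i + 1;             /* branch where something_happens holds */
        if (next == n) next = 0;
        if (!seen[next]) { seen[next] = true; stack[top++] = next; }
        /* the other branch leaves i unchanged: nothing new to visit */
    }
    return max_i;
}

/* The generated abstract transfer function for iMax (requires n > 0). */
int abstract_imax(int n) {
    assert(n > 0);
    return (n < 2) ? 0 : n - 1;
}
```

For every integer n within the supported range, the two functions are expected to agree, which is what optimality of the interval means on this example.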

The case disjunction n < 2 looks unnecessary, but is a side effect of the use of rational
numbers to model a problem over the integers. The resulting abstract transfer function
is optimal, but on such a simple case, one could have obtained the same using polyhedra
[Cousot and Halbwachs 1978] or octagons [Miné 2001].

Let us now consider the same program, simply replacing n by the constant 20. All
implementations of intervals (and thus of octagons and polyhedra, since we only have one variable)
will overshoot the $i_{\max} = 19$ target when using the traditional widening and narrowing
strategies: they will compute $i \in [0, 0]$, then $i \in [0, 1]$, $i \in [0, 2]$ and widen to
$[0, +\infty[$, and narrowing will not reduce the interval. Even if we replaced i == 20 by i >= 20,
narrowing would still fail to reduce the interval due to the nondeterministic choice, since the
concrete transfer function f, mapping sets of states at the head of the loop to sets of states at
the next iteration, is expansive: for every set of states W, $W \subseteq f(W)$. This is a well-known
weakness of the widening/narrowing approach, and the workaround is a syntactic trick known as
widening up to or widening with thresholds: for each variable, the constants to which it is compared
are gathered and used as widening steps [Blanchet et al. 2003, Sec. 7.1.2]. This syntactic approach
fails if tests are more indirect, whereas our semantic approach is not affected.
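
To make the workaround concrete, here is a minimal sketch of a widening-with-thresholds step on integer intervals. The type interval, the function widen_thresholds and the use of INT_MIN/INT_MAX as stand-ins for infinite bounds are assumptions made for this illustration; they are not taken from any particular analyzer.

```c
#include <assert.h>
#include <limits.h>

/* Integer interval; INT_MIN and INT_MAX stand for -infinity and +infinity. */
typedef struct { int lo, hi; } interval;

/* If the upper bound grew, jump to the smallest threshold that covers it,
   or to +infinity if none does. thresholds[] must be sorted increasingly.
   The lower bound uses plain widening. */
interval widen_thresholds(interval old, interval next,
                          const int *thresholds, int count) {
    assert(count >= 0);
    interval r = old;
    if (next.lo < old.lo) r.lo = INT_MIN;
    if (next.hi > old.hi) {
        r.hi = INT_MAX;
        for (int k = 0; k < count; k++) {
            if (thresholds[k] >= next.hi) { r.hi = thresholds[k]; break; }
        }
    }
    return r;
}
```

With the single threshold 20 gathered from the test i == 20, widening [0, 1] against [0, 2] yields [0, 20] rather than an unbounded interval. The threshold comes from the program text, which is why the trick stops working when the bound does not appear literally in a test.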



3.4.3 Synchronous Data Flow Example: Rate Limiter 

To go back to the original problem of floating-point data in data-flow languages, let us consider
the following library block: a rate limiter. When compiled into C, such a block is inserted in
a reactive loop, as shown below, where assume(c) stands for if (c) {} else {fail();}:

while (true) {
  e1 = random(); assume(e1 >= e1min && e1 <= e1max);
  e2 = random(); assume(e2 >= e2min && e2 <= e2max);
  e3 = random(); assume(e3 >= e3min && e3 <= e3max);
  olds1 = s1;
  if (random()) {
    s1 = e3;
  } else {
    if (e1 - olds1 < -e2) {
      s1 = olds1 - e2;
    }
    if (e1 - olds1 > e2) {
      s1 = olds1 + e2;
    }
  }
}

We are interested in the input-output behavior of that block: obtain bounds on the output
s1 of the system as functions of bounds on the inputs (e1, e2, e3). Note that in this case, s1,
e1, e2, e3 are streams, not single scalars. One difficulty is that the s1 output is memorized,
so as to be used as an input to the next computation step. The semantics of such a block is
therefore expressed as a fixed point.

We wish to know the least inductive invariant of the form $s_{1\min} \le s_1 \le s_{1\max}$ under the
assumption that $e_{1\min} \le e_{1\max} \wedge e_{2\min} \le e_{2\max} \wedge e_{3\min} \le e_{3\max}$.
The stability condition yields, after quantifier elimination and projection on $s_{1\max}$, the
condition $s_{1\max} \ge e_{1\max} \wedge s_{1\max} \ge e_{3\max}$. Minimization then yields an
expression that can be compiled to an if-then-else tree:

if (e1max > e3max) {
  s1max = e1max;
} else {
  s1max = e3max;
}
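
The bound can be exercised on the concrete block with a small randomized simulation. The sketch below is only an illustration under stated assumptions: inputs are small integers stored in doubles (so that all arithmetic is exact), e2 is drawn non-negative, the memory s1 starts at e3min, and the driver simulate_max with its linear congruential generator is hypothetical.

```c
#include <assert.h>

/* One reactive step of the rate limiter; the nondeterministic choice is
   passed explicitly as `reset`. */
double rate_limiter_step(double s1, double e1, double e2, double e3, int reset) {
    double olds1 = s1;
    if (reset) {
        s1 = e3;
    } else {
        if (e1 - olds1 < -e2) s1 = olds1 - e2;
        if (e1 - olds1 > e2)  s1 = olds1 + e2;
    }
    return s1;
}

/* The generated abstract transformer for the upper bound. */
double abstract_s1max(double e1max, double e3max) {
    return (e1max > e3max) ? e1max : e3max;
}

/* Small deterministic pseudo-random generator (illustrative only). */
static unsigned lcg(unsigned *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Draws an integer in [lo, hi], returned as a double. */
static double draw(unsigned *state, int lo, int hi) {
    return lo + (int)(lcg(state) % (unsigned)(hi - lo + 1));
}

/* Hypothetical driver: returns the largest s1 observed over `steps` steps.
   Assumes e2min >= 0 and an initial memory value of e3min. */
double simulate_max(int e1min, int e1max, int e2min, int e2max,
                    int e3min, int e3max, int steps, unsigned seed) {
    assert(e1min <= e1max && 0 <= e2min && e2min <= e2max && e3min <= e3max);
    double s1 = e3min;
    double max_seen = s1;
    for (int k = 0; k < steps; k++) {
        double e1 = draw(&seed, e1min, e1max);
        double e2 = draw(&seed, e2min, e2max);
        double e3 = draw(&seed, e3min, e3max);
        int reset = (int)(lcg(&seed) & 1u);
        s1 = rate_limiter_step(s1, e1, e2, e3, reset);
        if (s1 > max_seen) max_seen = s1;
    }
    return max_seen;
}
```

Under these assumptions, the value returned by simulate_max should never exceed abstract_s1max(e1max, e3max). A simulation can only fail to find a counterexample; the guarantee itself comes from the quantifier elimination.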
This result, automatically obtained, coincides with the intuition that a rate limiter (at
least, one implemented with exact arithmetic) should not change the range of the signal that
it processes. This program fragment has a rather more complex behavior if all variables and
operations are IEEE-754 floating-point, since rounding errors introduce slight differences of
regimes between ranges of inputs (Sec. 4.4). Rounding errors in the program to be analyzed
introduce difficulties for analyses using widenings, since invariant candidates are likely to be
"almost stable", but not truly stable, because of these errors. Again, there exist workarounds
so that widening-based approaches can still operate [Blanchet et al. 2003, Sec. 7.1.4].



4 Extensions to the Admissible Domains and Operations 

The class of domains and program constructs of the preceding section may seem too limited. 
We shall see here a few extensions. 



4.1 Infinities 

Consider the interval abstract domain, defined by $x \le p_2 \wedge -x \le p_1$. The techniques explained
in Sec. 3.1 allow only finite bounds. Yet, it makes sense that $p_1$ and $p_2$ could be equal to
$+\infty$ so as to represent infinite intervals. This can be easily achieved by a minor alteration
to our definitions. Each parameter $p_i$ is replaced by two parameters $p_i^\flat$ and $p_i^\infty$.
$p_i^\infty$ is constrained to be in {0, 1} (if the quantifier elimination procedure in use allows
Boolean variables, then $p_i^\infty$ can be taken as a Boolean variable); $p_i^\infty = 0$ means that
$p_i$ is finite and equal to $p_i^\flat$, $p_i^\infty = 1$ means $p_i = +\infty$. $L_i \le p_i$ becomes
$(p_i^\infty > 0) \vee (L_i \le p_i^\flat)$, and $L_i < p_i$ becomes $(p_i^\infty > 0) \vee (L_i < p_i^\flat)$.
After this rewriting, all formulas are formulas of the theory of linear
inequalities without infinities and are amenable to the appropriate algorithms.
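
The encoding can be pictured with a small data structure. In this minimal sketch, the struct bound and the function le_bound are names chosen for illustration; the point is only that the comparison never involves an infinite value, mirroring the rewriting of the atomic inequalities described above.

```c
#include <assert.h>
#include <stdbool.h>

/* A bound split into a finite part and an infinity flag:
   inf == 0 means the bound equals fin, inf == 1 means it is +infinity. */
typedef struct { double fin; int inf; } bound;

/* Encodes L <= p as (inf > 0) or (L <= fin), without infinite values. */
bool le_bound(double L, bound p) {
    assert(p.inf == 0 || p.inf == 1);
    return (p.inf > 0) || (L <= p.fin);
}
```

When the flag is set, the finite part is ignored, so any value may be stored there.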



4.2 Non-Convex Domains 

Section 3.1 constrains formulas to be conjunctions of inequalities of the form $L_i \le p_i$. What
happens if we consider formulas that may contain disjunctions?

The template linear constraint domains of Section 3.1 have a very important property:
they are closed under (infinite) intersection; that is, if we have a family $p \in W$, then there
exists $p_0$ such that $\bigcap_{p \in W} \gamma_F(p) = \gamma_F(p_0)$ (besides, $p_0 = \inf\{p \mid p \in W\}$).
This is what enables us to request the least element that contains the exact post-condition, or the
least inductive invariant in the domain: we take the intersection of all acceptable elements.

Yet, if we allow non-convex domains, there does not necessarily exist a least element
$\gamma_F(p)$ such that $S \subseteq \gamma_F(p)$. Consider for instance S = {0, 1, 2} and F representing
unions of two intervals $((-x \le p_1 \wedge x \le p_2) \vee (-x \le p_3 \wedge x \le p_4)) \wedge p_2 < -p_3$.
There are two incomparable minimal elements of the form $\gamma_F(p)$:
$p_1 = p_2 = 0 \wedge p_3 = -1 \wedge p_4 = 2$ and $p_1 = 0 \wedge p_2 = 1 \wedge p_3 = -2 \wedge p_4 = 2$.

We consider formulas F built out of linear inequalities $L_i(s_1, \dots, s_n) \le p_i$ as atoms,
conjunctions, and disjunctions. By induction on the structure of F, we can show that
$\gamma_F : (\mathbb{R} \cup \{-\infty\})^n \to \mathcal{P}(\mathbb{R}^n)$ is inf-continuous; that is, for any
descending chain $(p_i)_{i \in I}$ such that $\lim_i p_i = p_\infty$, $\gamma_F(p_i)$ is decreasing and
$\bigcap_{i \in I} \gamma_F(p_i) = \gamma_F(p_\infty)$. The property is trivial for atomic formulas, and is
preserved by greatest lower bounds ($\wedge$) as well as binary least upper bounds ($\vee$).
Let us consider a set $S \subseteq \mathcal{P}(\mathbb{R}^n)$, stable under arbitrary intersection (or at
least, greatest lower bounds of descending chains). S can be for instance the set of invariants of a
relation, or the set of over-approximations of a set W. $\gamma_F^{-1}(S)$ is the set of suitable domain
parameters; for instance, it is the set of parameters representing inductive invariants of the shape
specified by F, or the set of representable over-approximations of W. $\gamma_F^{-1}(S)$ is stable under
greatest lower bounds of descending chains: take a descending chain $(p_i)_{i \in I}$, then
$\gamma_F(\lim_i p_i) = \bigcap_i \gamma_F(p_i) \in S$ by inf-continuity and stability of S. By Zorn's
lemma, $\gamma_F^{-1}(S)$ has at least one minimal element.

Let $P[p]$ be a formula representing $\gamma_F^{-1}(S)$ (Sec. 3.1 proposes formulas defining safe
post-conditions and inductive invariants). The formula
$G[p] = P[p] \wedge \forall p'\ (P[p'] \wedge p' \le p \Rightarrow p \le p')$
defines the minimal elements of $\gamma_F^{-1}(S)$.

For instance, consider $p = (a, b, c, d)$, $F = (-x \le a \wedge x \le b) \vee (-x \le c \wedge x \le d)$,
representing unions of two intervals $[-a, b] \cup [-c, d]$. We want upper-approximations of the
set {0, 1, 3}; that is, $P[p] = \forall x\ (x = 0 \vee x = 1 \vee x = 3 \Rightarrow F[p, x])$. We add the
constraint that $-a \le b \wedge b < -c \wedge -c \le d$, so as not to obtain the same solutions twice (by
exchange of (a, b) and (c, d)) or solutions with empty intervals. By quantifier elimination over G, we
obtain $(a = 0 \wedge b = 1 \wedge c = -3 \wedge d = 3) \vee (a = 0 \wedge b = 0 \wedge c = -1 \wedge d = 3)$,
that is, either $[0, 1] \cup [3, 3]$ or $[0, 0] \cup [1, 3]$.



4.3 Domain Partitioning 

Non-convex domains, in general, are not stable under intersections and thus "best abstraction"
problems admit multiple solutions as minimal elements of the set of correct abstractions.
There are, however, non-convex abstract domains that are stable under intersection and thus
admit least elements, as do the template linear constraint domains of Sec. 3.1: those
defined by partitioning of the state space. Consider pairwise disjoint subsets $(C_i)_{i \in I}$ of the
state space $\mathbb{Q}^m$, and abstract domains stable under intersection $(S_i)_{i \in I}$,
$S_i \subseteq \mathcal{P}(C_i)$. Elements of the partitioned abstract domain are unions
$\bigcup_{i \in I} s_i$ where $s_i \in S_i$. If $(\bigcup_{i \in I} s_{i,j})_{j \in J}$ is a family
of elements of the domain, then
$\bigcap_{j \in J} (\bigcup_{i \in I} s_{i,j}) = \bigcup_{i \in I} \bigcap_{j \in J} s_{i,j}$; that is,
intersections are taken separately in each $C_i$.

Take a family $(F_i[p])_{i \in I}$ of formulas defining template linear constraint domains
(conjunctions of linear inequalities $L_i(s_1, \dots, s_n) \le p_i$) and a family $(C_i)_{i \in I}$ of
formulas such that for all distinct i and i', $C_i \wedge C_{i'}$ is equivalent to false and
$C_1 \vee \dots \vee C_{|I|}$ is equivalent to true.
$F = (C_1 \wedge F_1) \vee \dots \vee (C_{|I|} \wedge F_{|I|})$ then defines an abstract domain such that
$\gamma_F$ is an inf-morphism. All the techniques of Sec. 3.1 then apply.



4.4 Floating-Point Computations 



Real-life programs do not operate on real numbers; they operate on fixed-point or floating-
point numbers. Floating-point operations have few of the good algebraic properties of real
operations; yet, they constitute approximations of these real operations, and the rounding
error introduced can be bounded.

In IEEE floating-point [IEEE 1985], each atomic operation (noting $\oplus, \ominus, \otimes, \oslash$ for
the operations so as to distinguish them from the operations $+, -, \times, /$ over the reals) is
mathematically defined as the image of the exact operation over the reals by a rounding
function.² This rounding function, depending on user choice, maps each real x to the nearest
floating-point value $r_n(x)$ (round to nearest mode, with some resolution mechanism for
non-representable values exactly in the middle of two floating-point values), $r_{-\infty}(x)$ the greatest
floating-point value less than or equal to x (round toward $-\infty$), $r_{+\infty}(x)$ the least floating-point
value greater than or equal to x (round toward $+\infty$), or $r_0(x)$ the floating-point value of the same
sign as x but whose magnitude is the greatest floating-point value less than or equal to $|x|$ (round
toward 0). If x is too large to be representable, $r(x) = \pm\infty$ depending on the sign of x.

The semantics of the rounding operation cannot be exactly represented inside the theory of
linear inequalities.³ As a consequence, we are forced to use an axiomatic over-approximation
of that semantics: a formula linking a real number x to its rounded version r(x).

Miné [2004] uses an inequality $|r(x) - x| \le \varepsilon_{rel} \cdot |x| + \varepsilon_{abs}$, where
$\varepsilon_{rel}$ is a relative error and $\varepsilon_{abs}$ is an absolute error, leaving aside the
problem of overflows. The relative error is due to rounding at the last binary digit of the
significand, while the absolute error is due to the fact that the range of exponents is finite and
thus that there exists a least positive floating-point number and some nonzero values get rounded to
zero instead of incurring a relative error.

Because our language for axioms is richer than the interval linear forms used by Miné, we
can express more precise properties of floating-point rounding. We recall briefly the
characteristics of IEEE-754 floating-point numbers. Nonzero floating-point numbers are represented as
follows: $x = \pm s \cdot m$ where $1 \le m < 2$ is the mantissa or significand, which has a fixed number
p of bits, and $s = 2^e$ is the scaling factor ($E_{\min} \le e \le E_{\max}$ is the exponent). The difference
introduced by changing the last binary digit of the mantissa is $\pm s \cdot \varepsilon_{last}$ where
$\varepsilon_{last} = 2^{-(p-1)}$: the unit in the last place or ulp. Such a decomposition is unique for a
given number if we impose that the leftmost digit of the mantissa is 1; this is called a normalized
representation. Except in the case of numbers of very small magnitude, IEEE-754 always works with
normalized representations. There exists a least positive normalized number $m_{normal}$ and a least
positive denormalized number $m_{denormal}$, and the denormals are the multiples of $m_{denormal}$
less than $m_{normal}$. All representable numbers are multiples of $m_{denormal}$.

Consider for instance floating-point addition or subtraction $x = \pm a \pm b$. Suppose that
$0 \le x \le m_{normal}$. a and b are multiples of $m_{denormal}$ and thus a - b is exactly represented
as a denormalized number; therefore r(x) = x. If $x \ge m_{normal}$, then
$|r(x) - x| \le \varepsilon_{rel} \cdot x$. The cases for x < 0 are symmetrical. We can therefore
characterize r(x) - x using linear inequalities through case analysis over x:
$\mathrm{Round}^+(a \oplus b, a + b)$ (respectively, $\mathrm{Round}^+(a \ominus b, a - b)$) holds, where

$\mathrm{Round}^+(r, x) = (x \le m_{normal} \wedge r = x)
  \vee (x \ge m_{normal} \wedge -\varepsilon_{rel} \cdot x \le r - x \le \varepsilon_{rel} \cdot x)$ (11)

2 We leave aside the peculiarities of some implementations, such as those of most C compilers over the 32-bit
Intel platform where there are "extended precision" types used for some temporary variables and expressions
can undergo double rounding [Monniaux 2008b].

3 To be pedantic, since IEEE floating-point formats are of a finite size, the rounding operation could be 
exactly represented by enumeration of all possible cases; this would anyway be impossible in practice due to 
the enormous size of such an enumeration. 
$\mathrm{Round}(r, x) = (x = 0 \wedge r = 0) \vee
  (x > 0 \wedge r \ge 0 \wedge \mathrm{Round}^+(r, x)) \vee
  (x < 0 \wedge r \le 0 \wedge \mathrm{Round}^+(-r, -x))$ (12)

To each floating-point expression e, we associate a "rounded-off" variable $r_e$, the value of
which we constrain using $\mathrm{Round}(r_e, e)$ or $\mathrm{Round}^+(r_e, e)$. For instance, an expression
$e = a \oplus b$ is replaced by a variable $r_e$, and the constraint $\mathrm{Round}^+(r_e, a + b)$ is added to
the semantics. In the case of a compound expression $e = a \otimes b \oplus c$, we introduce
$e_1 = a \otimes b$, and we obtain $\mathrm{Round}^+(r_e, r_{e_1} + c) \wedge \mathrm{Round}(r_{e_1}, ab)$.
If we know that the compiler uses a fused multiply-add operator, we can use
$\mathrm{Round}(r_e, ab + c)$ instead.
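
The axioms can be checked numerically on individual operations. The following minimal sketch instantiates them for IEEE-754 single precision; the choices m_normal = FLT_MIN and eps_rel = FLT_EPSILON are assumptions made here (the latter is a safe over-estimate of the relative error), and the function names are illustrative.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Round+(r, x) for x >= 0: exact below m_normal, relative error above.
   Assumed constants: m_normal = FLT_MIN, eps_rel = FLT_EPSILON (2^-23). */
bool round_plus(double r, double x) {
    assert(x >= 0.0);
    const double m_normal = FLT_MIN;
    const double eps_rel = FLT_EPSILON;
    return (x <= m_normal && r == x)
        || (x >= m_normal && -eps_rel * x <= r - x && r - x <= eps_rel * x);
}

/* Round(r, x): case split on the sign of x, as in formula (12). */
bool round_any(double r, double x) {
    if (x == 0.0) return r == 0.0;
    if (x > 0.0)  return r >= 0.0 && round_plus(r, x);
    return r <= 0.0 && round_plus(-r, -x);
}

/* Checks the axiom on one single-precision addition. The reference sum is
   computed in double, which is exact for operands of similar magnitude. */
bool check_add(float a, float b) {
    volatile float r = a + b;      /* force rounding to single precision */
    double x = (double)a + (double)b;
    return round_any((double)r, x);
}
```

A call such as check_add(0.1f, 0.2f) compares the rounded single-precision sum against the sum computed in double. The check is meaningful only when the double sum is exact, which the sketch does not verify.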



5 Complex control flow 

We have so far assumed no procedure call, and at most one single loop. We shall see here 
how to deal with arbitrary control flow graphs and call graph structures. 



5.1 Loop Nests 

In Sec. 3.4 we have explained how to abstract a single fixed point. The method can be
applied to multiple nested fixed points by replacing the inner fixed point by its abstraction.
For instance, assume the rate limiter of Sec. 3.4.3 is placed inside a larger loop. One may
replace it by its abstraction:

if (e1max > e3max) {
  s1max = e1max;
} else {
  s1max = e3max;
}
assume(s1 <= s1max);
/* and similar for s1min */



Alternatively, we can extend our framework to an arbitrary control flow graph with nested
loops, the semantics of which is expressed as a single fixed point. We may use the same
method as proposed by [Gulwani et al. 2008, §2] and other authors. First, a cut set of program
locations is identified; any cycle in the control flow graph must go through at least one program
point in the cut set. In widening-based fixed point approximations, one classically applies
widening at each point in the cut set. A simple method for choosing a cut set is to include
all targets of back edges in a depth-first traversal of the control-flow graph, starting from the
start node; in the case of structured programs, this amounts to choosing the head node of each
loop. This is not necessarily the best choice with respect to precision, though [Gulwani et al.
2008, §2.3]; Bourdoncle [1992, Sec. 3.6] discusses methods for choosing such a cut set.
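
The simple cut-set heuristic (targets of back edges in a depth-first traversal) can be sketched as follows. The adjacency-matrix representation cfg, the bound MAX_NODES and the convention that node 0 is the start node are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 32

typedef struct {
    int n;                              /* number of nodes; node 0 is the start */
    bool edge[MAX_NODES][MAX_NODES];    /* edge[u][v]: control flow from u to v */
} cfg;

static void dfs(const cfg *g, int u, int *color, bool *in_cut) {
    color[u] = 1;                       /* 1 = on the current DFS path */
    for (int v = 0; v < g->n; v++) {
        if (!g->edge[u][v]) continue;
        if (color[v] == 1) in_cut[v] = true;            /* back edge u -> v */
        else if (color[v] == 0) dfs(g, v, color, in_cut);
    }
    color[u] = 2;                       /* 2 = fully explored */
}

/* Marks in in_cut[] the targets of back edges of a depth-first traversal
   from node 0; every cycle reachable from node 0 goes through one of them. */
void cut_set(const cfg *g, bool in_cut[MAX_NODES]) {
    assert(g->n >= 1 && g->n <= MAX_NODES);
    int color[MAX_NODES] = { 0 };
    for (int v = 0; v < g->n; v++) in_cut[v] = false;
    dfs(g, 0, color, in_cut);
}
```

On a structured program with two nested loops, the marked nodes are the two loop heads, in line with the remark that the heuristic selects the head node of each loop.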

To each point in the cut set we associate an element in the abstract domain, parameterized
by a number of variables. The values of these variables for all points in the cut set
define an invariant candidate. Since paths between elements of the cut set cannot contain
a cycle, their denotational semantics can be expressed simply by an existentially quantified
formula. Possible paths between each source and destination element in the cut set define
a stability condition (Formula 4). The conjunction of all these stability conditions defines
acceptable inductive invariants. As above, the least inductive invariant is obtained by writing
a minimization formula (Sec. 3.4).

Let us take a simple example:

i=0;
while (true) { /* A */
  if (choice()) {
    j=0;
    while (j < i) { /* B */
      /* something */
      j=j+1;
    }
    i=i+1;
    if (i==20) {
      i=0;
    }
  } else {
    /* something */
  }
}

We choose program points A and B as cut set. At program point A, we look for an
invariant of the form $I_A(i, j) = i_{\min,A} \le i \le i_{\max,A}$, and at program point B, for an invariant
of the form
$I_B(i, j) = i_{\min,B} \le i \le i_{\max,B} \wedge j_{\min} \le j \le j_{\max} \wedge \delta_{\min} \le i - j \le \delta_{\max}$
(a difference-bound invariant). The (somewhat edited for brevity) stability formula is written:

$\forall j\ I_A(0, j) \wedge \forall i \forall j\ ((I_B(i, j) \wedge j \ge i \wedge (i + 1 \le 19 \vee
  i + 1 = 20 \vee i + 1 \ge 21)) \Rightarrow \mathrm{If}[i + 1 = 20, I_A(0, j), I_A(i + 1, j)]) \wedge
  \forall i \forall j\ (I_A(i, j) \Rightarrow I_B(i, 0)) \wedge \forall i \forall j\ ((I_B(i, j) \wedge j < i)
  \Rightarrow I_B(i, j + 1))$ (13)

Replacing $I_A$ and $I_B$ into this formula, then applying quantifier elimination, we obtain a
formula defining all acceptable tuples
$(i_{\min,A}, i_{\max,A}, i_{\min,B}, i_{\max,B}, j_{\min}, j_{\max}, \delta_{\min}, \delta_{\max})$.
Optimal values are then obtained by further quantifier elimination:
$i_{\min,A} = i_{\min,B} = j_{\min} = 0$, $i_{\max,A} = i_{\max,B} = 19$, $j_{\max} = 20$,
$\delta_{\min} = -1$, $\delta_{\max} = 19$.

The same example can be solved by replacing 20 by another variable n as in Sec. 3.4.2.
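
The bounds for the constant-20 version can be compared against a concrete run of the program. The sketch below is an illustration under assumptions: the nondeterministic choice always takes the branch containing the inner loop, the elided "something" statements are assumed not to modify i or j, and only the bounds on i and j and the upper bound on i - j are checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Bounds checked at program point B (head of the inner loop). */
static bool inv_B(int i, int j) {
    return 0 <= i && i <= 19 && 0 <= j && j <= 20 && i - j <= 19;
}

/* Runs `rounds` iterations of the outer loop and reports whether
   0 <= i <= 19 held at A and inv_B held at B on every visit. */
bool check_nested(int rounds) {
    assert(rounds >= 0);
    int i = 0, j = 0;
    for (int k = 0; k < rounds; k++) {      /* A */
        if (i < 0 || i > 19) return false;
        j = 0;
        for (;;) {                          /* B */
            if (!inv_B(i, j)) return false;
            if (!(j < i)) break;
            j = j + 1;
        }
        i = i + 1;
        if (i == 20) i = 0;
    }
    return true;
}
```

Over the integers, the concrete values of j at B never exceed 19; the slightly larger bound on j reported by the analysis is consistent with the use of rationals to model integer variables, as in Sec. 3.4.2.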

5.2 Procedures and Recursive Procedures

We have so far considered abstractions of program blocks with respect to sets of program 
states. A program block is considered as a transformer from a set of input program states
to the corresponding set of output program states. The analysis outputs a sound and optimal 
(in a certain way) abstract transformer, mapping an abstract set of input states to an abstract 
set of output states. 
Assuming there are no recursive procedures, procedure calls can be easily dealt with. We
can simply inline the procedure at the point of call, as done in e.g. Astrée [Blanchet et al.
2002, 2003, Cousot et al. 2005]. Because inlining the concrete procedure may lead to
code blowup, we may also inline its abstraction, considered as a nondeterministic program.
Consider a complex procedure P with input variable x and output variable z. We abstract
the procedure automatically with respect to the interval domain for the postcondition
($m_z \le z \le M_z$); suppose we obtain M_z := 1000; m_z := x, then we can replace the function call
by z <= 1000 && z >= x. This is a form of modular interprocedural analysis: considering
the call graph, we can abstract the leaf procedures, then those calling the leaf procedures and
so on. This method is however insufficient for dealing with recursive procedures.

In order to analyze recursive procedures, we need to abstract not sets of states, but sets
of pairs of states, expressing the input-output relationships of procedures. In the case of
recursive procedures, these relationships are the least solution of a system of equations.

To take a concrete example, let us consider McCarthy's famous "91 function" [Manna and McCarthy
1969, Manna and Pnueli 1970], which, non-obviously, returns 91 for all inputs less than 101:

int M(int n) {
  if (n > 100) {
    return n-10;
  } else {
    return M(M(n+11));
  }
}

The concrete semantics of that function is a relationship R between its input n and its
output r. It is the least solution of

$R \supseteq \{(n, r) \in \mathbb{Z}^2 \mid (n > 100 \wedge r = n - 10) \vee
  (n \le 100 \wedge \exists n_2 \in \mathbb{Z}\ (n + 11, n_2) \in R \wedge (n_2, r) \in R)\}$ (14)

We look for an inductive invariant of the form
$I = ((n \ge A) \wedge (r - n \ge \delta) \wedge (r - n \le \Delta)) \vee ((n \le B) \wedge (r = C))$,
a non-convex domain (Sec. 4.2). By replacing R by I into inclusion (14) and by universal
quantification over $n, r, n_2$, we obtain the set of admissible parameters for invariants of this
shape. By quantifier elimination, we obtain
$(C = 91) \wedge (\delta = \Delta = -10) \wedge (A = 101) \wedge (B = 100)$
within a fraction of a second using Mjollnir (see Sec. 6).
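
The resulting invariant can be checked against the concrete function on a finite range of inputs. In this minimal sketch, invariant_I and check_range are illustrative names, and the parameter values are those reported above (C = 91, both bounds on r - n equal to -10, A = 101, B = 100).

```c
#include <assert.h>
#include <stdbool.h>

/* McCarthy's 91 function, as in the listing above. */
int M(int n) {
    if (n > 100) return n - 10;
    return M(M(n + 11));
}

/* The invariant I instantiated with the parameters found by
   quantifier elimination. */
bool invariant_I(int n, int r) {
    return (n >= 101 && r - n == -10) || (n <= 100 && r == 91);
}

/* Checks I on every input in [lo, hi]. */
bool check_range(int lo, int hi) {
    assert(lo <= hi);
    for (int n = lo; n <= hi; n++)
        if (!invariant_I(n, M(n))) return false;
    return true;
}
```

Such a test only samples the input space, whereas the quantifier elimination establishes the invariant for all inputs at once.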

In this case, there is a single acceptable inductive invariant of the suggested shape. In
general, there may be parameters to optimize, as explained in Sec. 3.4. The result of this
analysis is therefore a map from parameters defining sets of states to parameters defining sets
of pairs of states (the abstraction of a transition relation). This abstract transition relation
(a subset of X x Y where X and Y are the input and output state sets) can be transformed
into an abstract transformer in $X^\sharp \to Y^\sharp$ as explained in Sec. 3.2. Such an interprocedural
analysis may also be used to enhance the analysis of loops [Martin et al. 1998].
6 Implementations and Experiments



We have implemented the techniques of Sec. 3 in quantifier elimination packages, including
Mathematica⁴ and Reduce 3.8⁵ + Redlog⁶, in addition to our own package, Mjollnir
[Monniaux 2008a].⁷

As test cases, we took a library of operators for synchronous programming, having streams
of floating-point values as input and outputs. These operators are written in a restricted subset
of C and take as many as 20 lines. A front-end based on CIL [Necula et al. 2002] converts
them into formulas, then these formulas are processed and the corresponding abstract transfer
functions are pretty-printed. Since for our application, it is important to bound numerical
quantities, we chose the interval domain.

For instance, the rate limiter presented in Sec. 3.4.3 was extracted from that library. Since
this operator includes a memory (a variable whose value is retained from a call to the operator 
to the next one), its data-flow semantics is expressed using a fixed-point. When considered 
with real variables, the resulting expanded formula was approximately 1000 characters long, 
and with floating point variables approximately 8000 characters long. Despite the length of 
these formulas, they can be processed by Mjollnir in a matter of seconds. The result can 
then be saved once and for all. 



Analyzers such as Astrée [Blanchet et al. 2002, 2003, Cousot et al. 2005] must have
special knowledge about such operators, otherwise the analysis results are too coarse (for
instance, the intervals do not get stabilized at all). The Astrée development team therefore
had to provide specialized, hand-written analyses. In contrast, all linear floating-point
operators in the library were analyzed within a fraction of a second using the method in the
present article, assuming that floating-point values in the source code were real numbers. If
one considered instead the abstraction of floating-point computations using real numbers from
Sec. 4.4, computation times did not exceed 17 seconds per operator; the formulas produced
are considerably more complex than in the real case. Note that this computation is done once
and for all for each operator; a static analyzer can therefore cache this information for
further use and need not recompute abstractions for library functions or operators unless these
functions are updated.

Our analyzer front-end currently cannot deal with non-numerical operations and data 
structures (pointers, records, and arrays). It is therefore not yet capable of directly dealing 
with the real control-command programs that e.g. Astree accepts, which do not consist 
purely of numerical operators. We plan to integrate our analysis method into a more generic 
analyzer. Alternatively, we plan to adapt a front-end for synchronous programming languages 
such as Simulink, a tool widely used by control/command engineers. 

The correctness of the methods described in this article does not rely on any particularity 
of the quantifier elimination procedure used, provided one also has symbolic computation 
procedures for e.g. putting formulas in disjunctive normal form and simplifying them. The 
difference between the various quantifier elimination and simplification procedures is efficiency; 
experiments showed that ours was vastly more efficient than the others tested for this kind of 



4 http://www.wolfram.com/

5 http://www.uni-koeln.de/REDUCE/

6 http://www.algebra.fim.uni-passau.de/~redlog/

7 Source code and GNU/Linux/IA32 binaries of this implementation are available from
http://www-verimag.imag.fr/~monniaux/download/automatic_abstraction.zip
application. For instance, our implementation was able to complete the analysis of the rate
limiter of Sec. 3.4.3 implemented over the reals in 1.4 s, and in 17 s with the same example
over floating-point numbers, while Redlog took 182 s for the former and could not finish the
latter, and Mathematica could analyze neither (out of memory). On other examples, our
quantifier elimination procedure is faster than the other ones, or can complete eliminations
that the others cannot [Monniaux 2008a].



7 Related Works 



There is a sizeable amount of literature concerning relational numerical abstract domains; that
is, domains that express constraints between numerical variables. Convex polyhedra were
proposed in the 1970s [Halbwachs 1979, Cousot and Halbwachs 1978], and there have been since
then many improvements to the technique; a bibliography was gathered by Bagnara et al.
[2006]. Algorithms on polyhedra are costly and thus a variety of domains intermediate between
simple interval analysis and convex polyhedra were proposed [Miné 2001, Clarisó and Cortadella
2004, Sankaranarayanan et al. 2005]. All these domains compute invariants using a widening
operator [Cousot and Cousot 1976, Cousot and Halbwachs 1978, Cousot and Cousot 1992].
There is, however, no guarantee that the resulting invariant is the best representable in the
abstract domain, even with the use of narrowing iterations; this is one difference with our
proposal, which computes the best representable inductive invariant.

Another difference is that these domains are designed to work with numerical values for the 
input constraints, thus the computation must be done for every value of the input constraints 
parameters. Using simple program transformations, they may also apply to symbolic input 
constraints (constraint parameters being taken as extra variables), but in general this will lead 
to bad results; for instance, the input-output relationship for the rate limiter of Sec. 3.4.3 is
not convex, while numerical abstract domains in the literature are convex. In comparison the 
algorithm in this article can be run once to obtain a formula that gives the best invariant 
depending on the input constraints, allowing modular analysis. 

Several methods have been proposed to synthesize invariants without using widening
operators [Colón et al. 2003, Cousot 2005, Sankaranarayanan et al. 2004]. In common with
us, they express as constraints the conditions under which some parametric invariant shape
truly is an invariant, then they use some resolution or simplification technique over those
constraints. Again, these methods are designed for solving the problem for one given set of
constraints on the inputs, as opposed to finding a relation between the output or fixed-point
constraints and the input constraints. In some cases, the invariant may also not be minimal.



Bagnara et al. [2005a,b] proposed improvements over the "classical" widenings on linear
constraint domains [Halbwachs 1979]. Gopan and Reps [2006] introduced "lookahead
widenings": standard widening-based analysis is applied to a sequence of syntactic
restrictions of the original program, which ultimately converges to the whole program; the idea
is to distinguish phases or modes of operation in order to make the widening more
precise. Gonnord and Halbwachs [2006] have proposed acceleration techniques for linear
constraints. These do not replace widenings altogether, but they alleviate the need for some of
the costly workarounds to the imprecision introduced by widenings, such as delayed
widening [Blanchet et al. 2003, Sec. 7.1.3]. These address a different problem from ours. On the
one hand, neither improved widenings nor acceleration guarantee that the inductive invariant
obtained at the end is the least one (indeed, they can yield the top element ⊤).⁸ Furthermore,
the invariant that these methods obtain is not parametric in the precondition, contrary
to the one that our method obtains. On the other hand, improved widenings work regardless
of the form of the transition relation, which our method constrains to be piecewise linear.
Some of the cited methods operate on general polyhedra, while our method constrains the
shape of the polyhedra that are found to a certain template.



Gaubert et al. [2007] and Gawlitza and Seidl [2007] proposed replacing the usual
widening/narrowing iteration techniques by a policy iteration (or strategy iteration) approach.
Their approach converges on a fixed point, but not necessarily the least one. Their idea is to
replace computing the least fixed point of a complex abstract operator (the point-wise minimum
of a family of simpler operators) by a sequence of least fixed point computations for these
simple operators. Their technique still needs to compute these latter least fixed points, and it
is possible that our method can help in that respect.

Techniques using quantifier elimination for generating nonlinear invariants for programs
using nonlinear arithmetic have also been proposed [Kapur 2004] and shown capable of
producing optimal invariants parameterized by input constraints [Monniaux 2007]. Quantifier
elimination in the theory of real closed fields is, however, a very costly technique.
Experimentally, the formulas generated by common implementations tend to grow huge (due to
difficult simplifications) and both time and space requirements grow very fast with the number of
variables. This is why we considered the linear case in the present article.



Gulwani et al. [2008] have also proposed a method for generating linear invariants over
integer variables, using a class of templates. The methods described in the present article can
be applied to linear invariants over integer variables in two ways: either by abstracting them
using rationals (as in the examples in Sec. 3.4.2 and 5.1), or by replacing quantifier
elimination over rational linear arithmetic by quantifier elimination over linear integer
arithmetic, also known as Presburger arithmetic. Quantifier elimination over Presburger
arithmetic is, however, very expensive [Fischer and Rabin 1974]. Gulwani et al. instead chose
to first consider integer variables as rationals, so as to be able to compute over rational
convex polyhedra, then to bound variables and constraint parameters so as to model them as
finite bit vectors, finally obtaining a problem amenable to SAT solving. Program variables are
finite bit vectors in most industrial programming languages, and parameters to useful
invariants over integer variables are often small, thus their approach seems justified. We do
not see, however, how their method could be applied to programs operating over real or
floating-point variables, which are the main motivation for the present article.



The idea of producing procedure summaries [Sharir and Pnueli 1981] as formulas mapping
input bounds to output bounds is not new. Rugina and Rinard [2005], in the context of pointer
analysis (with pointers considered as a base plus an integer offset), proposed a reduction to
linear programming. This reduction step, while sound, introduces an imprecision that is
difficult to measure in advance; our method, in contrast, is guaranteed to be "optimal" in a
certain sense. Rugina and Rinard's method, however, allows some nonlinear constructs in the
program to be analyzed. Martin et al. [1998] proposed applying interprocedural analysis to
loops.



¹ There exist exact acceleration techniques, but these rather apply to discrete automata.

Seidl et al. [2007] also produce procedure summaries as numerical constraints. Our procedure
summaries are implementations of the corresponding abstract transformer over some
abstract domain, while their analysis outputs a relationship between input and output
concrete values. Their analysis considers a convex set of concrete input-output relationships,
expressed as simplices, a restricted class of convex polyhedra. This restriction trades
precision for speed: the generator and constraint representations of simplices have
approximately the same size, while in general polyhedra exponential blowup can occur. Tests by
arbitrary linear constraints cannot be adequately represented within this framework.
Seidl et al. [2007, Sec. 4] propose deferring those constraints using auxiliary variables;
this, however, loses some precision. Their analysis and ours are therefore incomparable, since
they make different choices between precision and efficiency.



Lal et al. [2005] proposed an interprocedural analysis of numerical properties of functions
using weighted pushdown automata. The "weights" are taken in a finite-height abstract
domain, while the domains we consider have infinite height.

In earlier work, we proposed a method for obtaining input-output relationships of
digital linear filters with memories, taking into account the effects of floating-point
computations [Monniaux 2005]. This method computes an exact relationship between bounds on the
input and bounds on the output, without the need for an abstract domain for expressing the
local invariant; as such, for this class of problems, it is more precise than the method from
this article. This technique, however, cannot be easily generalized to cases where the operator
block contains tests.



8 Conclusion and Future Work 

Writing static analyzers by hand has long been found tedious and error-prone. One may of
course prove an existing analyzer correct through assisted proof techniques, which removes
the possibility of soundness mistakes, at the expense of much increased tedium. In this
article, we proposed instead effective methods to synthesize abstract domains by automatic
techniques. The advantages are twofold: new domains can be created much more easily, since
no programming is involved; and a single procedure, testable on independent examples, needs
to be written and possibly formally proved correct. To our knowledge, this is the first
effective proposal for generating numerical abstract domains automatically, and one of the few
methods for generating numerical summaries. It is also the only method so far for computing
summaries of floating-point functions.

We have shown that floating-point computations can be safely abstracted using our
method. The formulas produced are, however, fairly complex in this case, and we suspect
that further over-approximation could dramatically reduce their size. There is also nowadays
significant interest in automating, at least partially, the tedious proofs that computer
arithmetic experts do, and we think that the kind of methods described in this article could
help in that respect.

We have so far experimented with small examples, because the original goal of this work
was the automatic, on-the-fly synthesis of abstract transfer functions for small sequences
of code that could be more precise than the usual composition of abstractions of individual
instructions, and less tedious for the analysis designer than the method of pattern-matching
the code for "known" operators with known mathematical properties. A further goal is the
precise analysis of longer sequences, including integer and Boolean computations. We have
shown in Sec. 4.3 how it is possible to partition the state space and abstract each region of
the state space separately; but naive partitioning according to n Booleans leads to 2^n regions,
which can be unbearably costly and is unneeded in most cases. We think that automatic
refinement and partitioning techniques [Jeannet 2003] could be developed in that respect.